Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches

NUCA designs solve the on-chip wire delay problem for future large integrated caches. By embedding a network in the cache, NUCA designs let data migrate within the cache, clustering the working set nearest the processor.

Authors

  • Changkyu Kim
  • Doug Burger
  • Stephen W. Keckler
Abstract

The next generation of today's high-performance processors incorporates large level-two caches on the processor die. For example, the IBM Power5 will contain a 1.92-Mbyte L2 cache, the Hewlett-Packard PA8700 will contain 2.25 Mbytes of unified on-chip cache, and the Intel Itanium2 will contain 6 Mbytes of on-chip L3 cache. Cache sizes will continue to increase as bandwidth demands on the package grow, and as smaller technologies permit more bits per square millimeter. However, increasing global wire delays across the chip will make large on-chip caches with a single, discrete hit latency undesirable in future technologies. Data residing near the processor in a large cache is much more quickly accessible than data residing far from the processor. Accessing the closest bank in a 16-Mbyte, on-chip L2 cache built in a 50-nm process technology, for example, could take four cycles, whereas accessing the farthest bank might take 47 cycles. The bulk of the access time involves routing to and from the banks rather than the bank accesses themselves. Nonuniform cache access (NUCA) designs address this wire-delay problem. In this approach, a switched network allows data to migrate to different cache regions according to access frequency; that is, frequently accessed data migrates to areas closer to the processor. We propose several designs that treat the cache as a network of banks and facilitate nonuniform accesses to different physical regions. NUCA architectures offer low-latency access, increased scalability, and greater performance stability than conventional uniform access cache architectures.
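To make the migration mechanism concrete, here is a minimal Python sketch of a dynamic-NUCA bank array in which every hit moves a block one bank closer to the processor, so frequently accessed data accumulates in the low-latency banks. The class name, the cycle constants, and the fill-into-the-farthest-bank miss policy are illustrative assumptions for this sketch, not the hardware mechanism the paper specifies.

    # Toy model of a NUCA cache: banks sit at increasing network distance
    # from the processor, and a block moves one bank closer on every hit,
    # so hot data gravitates toward the fast banks near the processor.
    BANK_ACCESS_CYCLES = 3   # time to read a single bank (assumed)
    PER_HOP_CYCLES = 2       # switched-network delay per hop (assumed)

    class NUCACache:
        def __init__(self, num_banks):
            # banks[0] is closest to the processor, banks[-1] farthest
            self.banks = [set() for _ in range(num_banks)]

        def latency(self, bank_idx):
            # round-trip routing dominates the cost of reaching far banks
            return BANK_ACCESS_CYCLES + 2 * PER_HOP_CYCLES * bank_idx

        def access(self, addr):
            for i, bank in enumerate(self.banks):
                if addr in bank:
                    cycles = self.latency(i)
                    if i > 0:  # gradual promotion: move one bank closer
                        bank.remove(addr)
                        self.banks[i - 1].add(addr)
                    return cycles
            self.banks[-1].add(addr)  # miss: fill into the farthest bank
            return None

    cache = NUCACache(num_banks=16)
    cache.access(0x40)         # miss: installed in the farthest bank
    for _ in range(15):
        cache.access(0x40)     # each hit migrates the block one bank closer
    print(cache.latency(0), cache.latency(15))   # prints 3 and 63

With these assumed constants the nearest and farthest banks differ by more than an order of magnitude in latency, mirroring the 4-versus-47-cycle spread the abstract cites for a 16-Mbyte cache at 50 nm.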


Related articles

NUCA: A Non-Uniform Cache Access Architecture for Wire-Delay Dominated On-Chip Caches

This paper describes Non-Uniform Cache Access (NUCA) designs, which solve the on-chip wire delay problem for future large integrated caches. These designs embed a network into the cache itself, allowing data to migrate within the cache, clustering the working set in the cache region nearest to the processor. Today's high-performance processors incorporate large level-two (L2) caches on the proc...

Full text

Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches

The next generation of today's high-performance processors incorporates large level-two caches on the processor die. For example, the IBM Power5 will contain a 1.92-Mbyte L2 cache, the Hewlett-Packard PA8700 will contain 2.25 Mbytes of unified on-chip cache, and the Intel Itanium2 will contain 6 Mbytes of on-chip L3 cache. Cach...

Full text

Way adaptable D-NUCA caches

Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last-level caches: by partitioning a large cache into several banks, each with a latency that depends on its physical location, and by employing a scalable on-chip network to interconnect the banks with the cache controller, the average access latency can be reduced with respect to a traditi... (a toy version of this average-latency comparison follows this entry)

Full text
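The average-latency argument in the entry above can be checked with a toy calculation: a uniform cache must be clocked for its worst-case bank, while a NUCA cache pays only the latency of the bank that actually holds the data. The Python sketch below makes this comparison; the cycle constants and the skewed hit distribution (a working set clustered in the banks nearest the processor) are assumptions chosen purely for illustration.

    def bank_latency(i, bank_cycles=3, hop_cycles=2):
        # round-trip routing to bank i plus the bank access itself
        return bank_cycles + 2 * hop_cycles * i

    num_banks = 16
    uniform_latency = bank_latency(num_banks - 1)  # every hit pays worst case

    # assumed skewed hit distribution: most hits land in the close banks
    weights = [2.0 ** -i for i in range(num_banks)]
    probs = [w / sum(weights) for w in weights]

    nuca_latency = sum(p * bank_latency(i) for i, p in enumerate(probs))
    print(f"uniform: {uniform_latency} cycles, NUCA avg: {nuca_latency:.1f} cycles")

Under these assumptions every hit in the uniform cache costs 63 cycles, while the NUCA average works out to roughly 7 cycles; the benefit shrinks as the hit distribution flattens, which is why migration policies that keep the working set close matter.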

On-Chip Networks: Impact on the Performance of NUCA Caches

Non-Uniform Cache Architectures (NUCA) are a new design paradigm for large last-level on-chip caches, introduced to deliver low access latencies in wire-delay-dominated environments. Their structure is partitioned into sub-banks, and the resulting access latency is a function of the physical position of the requested data. Typically, NUCA caches make use of a switched network to con... (a mesh-latency sketch follows this entry)

Full text
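The switched network mentioned in the entry above is typically a 2D mesh of routers, one per bank or group of banks. The sketch below models the latency such a mesh imposes: a bank at mesh coordinate (x, y) costs a Manhattan-distance number of router and link traversals from a cache controller assumed to sit at (0, 0); the grid size and per-hop cycle counts are assumptions for illustration.

    ROUTER_CYCLES = 1   # per-hop switch traversal (assumed)
    LINK_CYCLES = 1     # per-hop wire traversal (assumed)
    BANK_CYCLES = 3     # the bank array access itself (assumed)

    def access_latency(x, y):
        hops = x + y                              # dimension-ordered routing
        one_way = hops * (ROUTER_CYCLES + LINK_CYCLES)
        return 2 * one_way + BANK_CYCLES          # request plus reply trip

    # print the latency map of a 4x4 bank grid, controller at (0, 0)
    for y in range(4):
        print([access_latency(x, y) for x in range(4)])

Each row of the printed map shows latency growing linearly with distance from the controller's corner, which is exactly the nonuniformity that NUCA designs expose and exploit.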

An Adaptive Cache Structure for Future High-Performance Systems

On-chip cache sizes are likely to continue to grow over the next decade as working sets, available chip capacity, and memory latencies all increase. Traditional cache architectures, with fixed sizes and discrete latencies, lock one organization down at design time, which will provide inferior performance across a range of workloads. In addition, expected increases in on-chip communication delay...

Full text



Journal title:

Volume   Issue

Pages  -

Publication date: 2003